Simplify bitmasks #375
Conversation
> I was able to remove the `ToBitMask` and `ToBitMaskArray` traits and remove the complex trait bounds by making these two changes:
>
> `to_bitmask` always returns `u64`. For larger vectors this truncates to 64 lanes.

This is likely less efficient, since LLVM has to figure out that most of the 64-element vector is zero and can thereby shrink the types... seems iffy.

> `to_bitmask_array` is now `to_bitmask_vector` and returns `Simd<T, N>` for `Mask<T, N>`. This of course wastes bytes (in memory), but simplifies the function signature.

I don't think removing `to_bitmask_array` is a good idea, since it's much easier to use when you need to directly inspect or modify the bitmask. Also, it potentially wastes much less memory. I'm ok with also having `to_bitmask_vector`...

> This also makes it easier to do parallel operations on the entire byte array, which is a common use-case for bitmasks anyway.

That seems less likely, since most of the parallel operations you'd want are already covered by `Mask`...
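For concreteness, the truncating semantics proposed for `to_bitmask` can be modeled in plain stable Rust. `to_bitmask_model` below is a hypothetical editorial stand-in for illustration, not the real `Mask::to_bitmask` implementation or intrinsic:

```rust
/// Hypothetical model of the proposed `to_bitmask` semantics: pack each
/// lane's boolean into the corresponding low bit of a `u64` (lane 0 maps
/// to bit 0), truncating any lanes past 64.
fn to_bitmask_model(lanes: &[bool]) -> u64 {
    lanes
        .iter()
        .take(64) // vectors with more than 64 lanes truncate here
        .enumerate()
        .fold(0u64, |acc, (i, &set)| acc | ((set as u64) << i))
}

fn main() {
    // Lanes 0, 2, and 3 set -> bits 0, 2, and 3 -> 0b1101.
    assert_eq!(to_bitmask_model(&[true, false, true, true]), 0b1101);
    println!("ok");
}
```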
I haven't read through the code, but from the description, I'm very much in favor of this simplification!
I think LLVM is generally very good at this sort of range optimization, FWIW. I may be misunderstanding though, and it might be worth checking?
LLVM is generally good with zero-elements in vectors, but if we like this API I intend on modifying the `simd_bitmask` intrinsic to instead allow zero-extending the integer result. Padding the vector is just a workaround until that's done.

Regarding having access to the bytes, you can always call `to_ne_bytes` on the resulting vector, and copy it into an array to save memory. Maybe cumbersome for that specific usage but it saves a lot on the API. `ToBitMaskArray`, with its associated type and bounds, is a bit much for a single function IMO, especially since most people will typically want the `u64`. This also isn't precluding an array implementation in the future, especially if const generics get better.
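As a rough sketch of the byte-access point: the `u64` result itself already exposes its bytes via the stable `u64::to_ne_bytes`, analogous to calling `to_ne_bytes` on the returned vector on nightly. This is an editorial illustration, not code from the PR:

```rust
fn main() {
    // A 64-lane bitmask held as a u64; its bytes are available directly
    // via the stable `u64::to_ne_bytes`, which can then be copied into an
    // array to save memory without a dedicated array-returning API.
    let bits: u64 = 0b1010_0110;
    let bytes: [u8; 8] = bits.to_ne_bytes();
    // Round-trips regardless of target endianness.
    assert_eq!(u64::from_ne_bytes(bytes), bits);
    println!("{bytes:?}");
}
```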
not in this case:
If we like the API I can update the intrinsic.
Ugh, I've definitely seen it do better than that. I added a new workaround that doesn't require padding the vectors.
other than that, looks good enough for me.
GitHub apparently never posted my review comment that…

Oh, yeah, I'll add that.